gchalump

Summary:
X-link: https://github.com/facebookresearch/FBGEMM/pull/1516

Re-land attempt of D75462895

Add TBE data configuration reporter to TBE forward call.

The reporter records the TBE data configuration at each SplitTableBatchedEmbeddingBagsCodegen forward call. The output is a TBEDataConfig object, which is written to one or more JSON files. The environment variables that configure the reporter and an example of its usage are described below.

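Just Knobs for enablement

  • fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS enables the reporter (https://www.internalfb.com/intern/justknobs/?name=fbgemm_gpu%2Ffeatures).
    • The default is False; turn this flag on to enable the reporter.
    • To enable it locally, run: jk canary set fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS --on --ttl 600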

Environment Variables


The Reporter relies on several environment variables to control its behavior. Below is a description of each variable:

  • FBGEMM_REPORT_INPUT_PARAMS_INTERVAL:

    • Description: Sets the interval, in number of iterations, at which reports are generated.
    • Example Value: 1 (report every iteration)
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_START:

    • Description: Specifies the first iteration of the range in which reports are captured. Default: 0.
    • Example Value: 0 (start reporting from the first iteration)
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_END:

    • Description: Specifies the last iteration of the range in which reports are captured. Use -1 to report until the last iteration. Default: -1.
    • Example Value: -1 (report until the last iteration)
  • FBGEMM_REPORT_INPUT_PARAMS_BUCKET:

    • Description: Specifies the name of the Manifold bucket where the report data will be saved.
    • Example Value: tlparse_reports
  • FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX:

    • Description: Defines the path prefix under which the report files are stored. The path is created if it does not exist.
    • Example Value: tree/tests/
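
As a rough illustration of how these variables could be read, here is a minimal Python sketch. `ReporterSettings` and `load_reporter_settings` are hypothetical names chosen for this example and are not part of FBGEMM. The defaults for ITER_START and ITER_END follow the descriptions above; treating an unset interval, bucket, or path prefix as `None` or empty is an assumption of the sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_PREFIX = "FBGEMM_REPORT_INPUT_PARAMS_"


@dataclass(frozen=True)
class ReporterSettings:
    """Hypothetical container for the reporter's environment settings."""

    interval: Optional[int]  # None when INTERVAL is unset (assumption)
    iter_start: int          # documented default: 0
    iter_end: int            # documented default: -1 (until the last iteration)
    bucket: Optional[str]    # Manifold bucket name, if any
    path_prefix: str         # empty string when unset (assumption)


def load_reporter_settings(env: Mapping[str, str] = os.environ) -> ReporterSettings:
    """Read the FBGEMM_REPORT_INPUT_PARAMS_* variables from a mapping."""
    raw_interval = env.get(_PREFIX + "INTERVAL")
    return ReporterSettings(
        interval=int(raw_interval) if raw_interval is not None else None,
        iter_start=int(env.get(_PREFIX + "ITER_START", "0")),
        iter_end=int(env.get(_PREFIX + "ITER_END", "-1")),
        bucket=env.get(_PREFIX + "BUCKET"),
        path_prefix=env.get(_PREFIX + "PATH_PREFIX", ""),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to exercise without touching the process environment.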

Use Cases

  • FileStore
    • General
      • Output directories are created automatically if they do not exist.
    • fb-internal:
      • Exports only to Manifold.
      • Raises an assertion error if the flag is set but the Manifold connection fails to initialize (the backend is missing or the Manifold bucket does not exist).
    • OSS
      • Uses the local FileStore to store the output.
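
The local (OSS) case, with automatic directory creation, might look roughly like the sketch below. It uses only the Python standard library and does not call the FBGEMM FileStore API; `write_report`, the file name, and the shape of the `config` dictionary are all hypothetical.

```python
import json
from pathlib import Path


def write_report(root: Path, path_prefix: str, name: str, config: dict) -> Path:
    """Write one report as JSON under root/path_prefix, creating missing directories."""
    target = root / path_prefix / name
    # Create any missing parent directories; an existing directory is not an error.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2))
    return target
```

For example, `write_report(Path("/tmp/reports"), "tree/tests/", "iter_3.json", {...})` would create `/tmp/reports/tree/tests/` on first use and write the JSON file into it.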

Example Usage


Below is an example command demonstrating how to use the FBGEMM Reporter with specific environment variable settings:

FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2 FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3 \
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2

Explanation

With the settings above, the reporter reports iter 3 and iter 5.

  • FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2: The reporter generates a report every 2 iterations.
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3: The reporter starts generating reports at iteration 3 (iterations are numbered from 0).
  • FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (default): The reporter continues to generate reports until the last iteration.
  • FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports: The reports are saved in the tlparse_reports bucket.
  • FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/: The reports are stored with the path prefix tree/tests/. For Manifold, make sure all folders within the path exist.

Note on Benchmark example

With the --iters 2 option, the benchmark executes 6 forward calls in total: 3 (2 iterations plus 1 warmup) for the forward benchmark and another 3 (2 iterations plus 1 warmup) for the backward benchmark. Iteration numbering starts from 0, so these calls are iterations 0 through 5.
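
The selection of iterations in this example can be sketched in Python as follows. `should_report` is a hypothetical helper, not the reporter's actual code. It assumes that the interval is counted from ITER_START and that ITER_END is inclusive, which matches the "iter 3 and iter 5" result stated for the example command.

```python
def should_report(iteration: int, interval: int, iter_start: int = 0, iter_end: int = -1) -> bool:
    """Return True if a report would be written for this iteration (sketch)."""
    if interval <= 0 or iteration < iter_start:
        return False
    # iter_end == -1 means "no upper bound"; otherwise treat it as inclusive (assumption).
    if iter_end != -1 and iteration > iter_end:
        return False
    # Count the interval from the first reported iteration (assumption).
    return (iteration - iter_start) % interval == 0


# Six forward calls, numbered 0 through 5, with INTERVAL=2 and ITER_START=3.
reported = [i for i in range(6) if should_report(i, interval=2, iter_start=3)]
print(reported)  # [3, 5]
```

If the interval were instead counted from iteration 0, the same settings would select only iteration 4, which is why the sketch offsets by `iter_start`.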



Other changes included in this Diff:

  • Update the build dependencies of the tbe_data_config* files.
  • Remove the shutil and numpy.random libraries, as they cause an incompatibility error.
  • Add a non-OSS test that writes the extracted config data JSON file to Manifold.

Differential Revision: D79758603

netlify bot commented Aug 11, 2025

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
🔨 Latest commit b631308
🔍 Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68b623f55d1c1100074aad19
😎 Deploy Preview https://deploy-preview-4672--pytorch-fbgemm-docs.netlify.app

@meta-cla meta-cla bot added the cla signed label Aug 11, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D79758603

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 12, 2025
Summary:
X-link: facebookresearch/FBGEMM#1703


X-link: facebookresearch/FBGEMM#1516


Re-land attempt of D75462895

# Add TBE data configuration reporter to TBE forward call.
The reporter records the TBE data configuration at each `SplitTableBatchedEmbeddingBagsCodegen` ***forward*** call. The output is a `TBEDataConfig` object, which is written to one or more JSON files. The environment variables that configure the reporter and an example of its usage are described below.

## Just Knobs for enablement
 - `fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS` is added to enable the reporter (https://www.internalfb.com/intern/justknobs/?name=fbgemm_gpu%2Ffeatures)
    - The default is `False`; turn this flag on to enable the reporter.
    - To enable it locally, use:
       ```
       jk canary set fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS --on --ttl 600
       ```


## Environment Variables
---------------------

The Reporter relies on several environment variables to control its behavior. Below is a description of each variable:

-   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL**:
    -   **Description**: Sets the interval, in number of iterations, at which reports are generated.
    -   **Example Value**: `1` (report every iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START**:
    -   **Description**: Specifies the first iteration of the range in which reports are captured. Default: `0`.
    -   **Example Value**: `0` (start reporting from the first iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END**:
    -   **Description**: Specifies the last iteration of the range in which reports are captured. Use `-1` to report until the last iteration. Default: `-1`.
    -   **Example Value**: `-1` (report until the last iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET**:
    -   **Description**: Specifies the name of the Manifold bucket where the report data will be saved.
    -   **Example Value**: `tlparse_reports`

-   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX**:
    -   **Description**: Defines the path prefix under which the report files are stored. The path is created if it does not exist.
    -   **Example Value**: `tree/tests/`

## Use Cases
- FileStore
    - General
        - Output directories are created automatically if they do not exist.
    - fb-internal:
        - Exports only to Manifold.
        - Raises an assertion error if the flag is set but the Manifold connection fails to initialize (the backend is missing or the Manifold bucket does not exist).
    - OSS
        - Uses the local FileStore to store the output.


## Example Usage
-------------

Below is an example command demonstrating how to use the FBGEMM Reporter with specific environment variable settings:

```
FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2 FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3 \
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2
```

**Explanation**

With the settings above, the reporter reports `iter 3` and `iter 5`.

*   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2**: The reporter generates a report every 2 iterations.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3**: The reporter starts generating reports at iteration 3 (iterations are numbered from 0).
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (default)**: The reporter continues to generate reports until the last iteration.
*   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports**: The reports are saved in the `tlparse_reports` bucket.
*   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/**: The reports are stored with the path prefix `tree/tests/`. For Manifold, make sure all folders within the path exist.

**Note on Benchmark example**

With the `--iters 2` option, the benchmark executes 6 forward calls in total: 3 (2 iterations plus 1 warmup) for the forward benchmark and another 3 (2 iterations plus 1 warmup) for the backward benchmark. Iteration numbering starts from 0, so these calls are iterations 0 through 5.


---
---
## Other changes included in this Diff:
  - Update the build dependencies of the `tbe_data_config*` files.
  - Remove the `shutil` and `numpy.random` libraries, as they cause an incompatibility error.
  - Add a non-OSS test that writes the extracted config data JSON file to Manifold.

Differential Revision: D79758603
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 12, 2025
Summary:
X-link: facebookresearch/FBGEMM#1703

Pull Request resolved: pytorch#4672

X-link: facebookresearch/FBGEMM#1516

Pull Request resolved: pytorch#4455

Re-land attempt of D75462895

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 14, 2025
Summary:
X-link: facebookresearch/FBGEMM#1703


X-link: facebookresearch/FBGEMM#1516


Re-land attempt of D75462895

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 14, 2025
Summary:
X-link: facebookresearch/FBGEMM#1703

Pull Request resolved: pytorch#4672

X-link: facebookresearch/FBGEMM#1516

Pull Request resolved: pytorch#4455

Re-land attempt of D75462895

# Add TBE data configuration reporter to TBE forward call.
The reporter reports TBE data configuration at the `SplitTableBatchedEmbeddingBagsCodegen` ***forward*** call. The output is a `TBEDataConfig` object, which is written to a JSON file(s). The configuration of its environment variables and an example of its usage is described below.

## Just Knobs for enablement
 - fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS is added for enablement of the reporter (https://www.internalfb.com/intern/justknobs/?name=fbgemm_gpu%2Ffeatures)
    - Default is set to `False`, enable this flag to enable reporter.
    - To enable it locally use:
       ```
       jk canary set fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS --on --ttl 600
       ```

## Environment Variables
---------------------

The Reporter relies on several environment variables to control its behavior. Below is a description of each variable:

-   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL**:
    -   **Description**: Sets the interval, in number of iterations, at which reports are generated.
    -   **Example Value**: `1` (report every iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START**:
    -   **Description**: Specifies the start of the iteration range in which reports are captured. Default `0`.
    -   **Example Value**: `0` (start reporting from the first iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END**:
    -   **Description**: Specifies the end of the iteration range in which reports are captured. Use `-1` to report until the last iteration. Default `-1`.
    -   **Example Value**: `-1` (report until the last iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET**:
    -   **Description**: Specifies the name of the Manifold bucket where the report data will be saved.
    -   **Example Value**: `tlparse_reports`

-   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX**:
    -   **Description**: Defines the path prefix under which the report files are stored. The path is created if it does not exist.
    -   **Example Value**: `tree/tests/`
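
As a rough illustration of how these five variables could be read in one place, here is a minimal Python sketch. It is not the FBGEMM implementation: the `ReporterEnv` class and `read_reporter_env` function are hypothetical names, and the fallbacks for `INTERVAL` (`1`) and `PATH_PREFIX` (empty string) are assumptions, since only the `ITER_START` and `ITER_END` defaults are stated above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "FBGEMM_REPORT_INPUT_PARAMS_"


@dataclass(frozen=True)
class ReporterEnv:
    """Hypothetical container for the reporter settings (not an FBGEMM class)."""

    interval: int
    iter_start: int
    iter_end: int
    bucket: Optional[str]
    path_prefix: str


def read_reporter_env(env: Mapping[str, str] = os.environ) -> ReporterEnv:
    """Parse the reporter variables from a mapping such as os.environ."""
    return ReporterEnv(
        # Assumption: report every iteration when INTERVAL is unset.
        interval=int(env.get(PREFIX + "INTERVAL", "1")),
        # Documented default: 0.
        iter_start=int(env.get(PREFIX + "ITER_START", "0")),
        # Documented default: -1 (report until the last iteration).
        iter_end=int(env.get(PREFIX + "ITER_END", "-1")),
        # No bucket means no Manifold target in this sketch.
        bucket=env.get(PREFIX + "BUCKET"),
        # Assumption: empty prefix when PATH_PREFIX is unset.
        path_prefix=env.get(PREFIX + "PATH_PREFIX", ""),
    )


example = read_reporter_env(
    {PREFIX + "INTERVAL": "2", PREFIX + "ITER_START": "3"}
)
print(example)
```

Passing a plain dictionary instead of `os.environ` keeps the parsing easy to exercise without touching the real process environment.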

## Use Cases
- FileStore
  - General
    - Output directories are created automatically if they do not exist.
  - fb-internal
    - Exports only to Manifold.
    - Raises an assertion error if the flag is set but the Manifold connection fails to initialize (for example, a missing backend or a nonexistent Manifold bucket).
  - OSS
    - Uses the local FileStore to store the output.
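
The OSS behavior (local storage with auto-created directories) can be pictured with the standard library alone. The sketch below does not use the actual FileStore API; `write_report_locally`, the file name `report_iter_3.json`, and the payload are all hypothetical stand-ins for a serialized `TBEDataConfig`.

```python
import json
import os
import tempfile


def write_report_locally(root: str, path_prefix: str, name: str, config: dict) -> str:
    """Write `config` as JSON under root/path_prefix, creating directories as needed."""
    out_dir = os.path.join(root, path_prefix)
    # Auto-create the output directories if they do not exist.
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, name)
    with open(out_path, "w") as f:
        json.dump(config, f, indent=2)
    return out_path


# Use a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as root:
    path = write_report_locally(
        root, "tree/tests/", "report_iter_3.json", {"iteration": 3}
    )
    with open(path) as f:
        loaded = json.load(f)

print(loaded)
```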

## Example Usage
-------------

Below is an example command demonstrating how to use the FBGEMM Reporter with specific environment variable settings:

```
FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2 FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3 \
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ \
buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2
```

**Explanation**

The above settings will report `iter 3` and `iter 5`.

*   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2**: The reporter generates a report every 2 iterations.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3**: The reporter starts generating reports at iteration 3 (iterations are numbered from 0).
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (Default)**: The reporter continues to generate reports until the last iteration.
*   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports**: The reports will be saved in the `tlparse_reports` bucket.
*   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/**: The reports will be stored with the path prefix `tree/tests/`. For Manifold, make sure all folders in the path exist.
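
To see why the command above yields reports at `iter 3` and `iter 5`, the selection rule can be sketched in Python. This is an assumed reading of the settings rather than the reporter's actual code: `should_report` is a hypothetical helper, the interval is counted from `ITER_START`, and `ITER_END` is treated as inclusive.

```python
def should_report(
    iteration: int, interval: int, iter_start: int = 0, iter_end: int = -1
) -> bool:
    """Return True if a report would be generated at this 0-based iteration."""
    if iteration < iter_start:
        return False
    # Assumption: iter_end is inclusive; -1 means no upper bound.
    if iter_end != -1 and iteration > iter_end:
        return False
    # Assumption: the interval is counted from iter_start.
    return (iteration - iter_start) % interval == 0


# The example command sets INTERVAL=2 and ITER_START=3; six forward calls
# (iterations 0 through 5) are assumed, matching the benchmark with --iters 2.
reported = [i for i in range(6) if should_report(i, interval=2, iter_start=3)]
print(reported)  # [3, 5]
```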

**Note on Benchmark example**

Note that with the `--iters 2` option, the benchmark executes 6 forward calls in total: 3 (2 iterations plus 1 warmup) for the forward benchmark and another 3 (2 iterations plus 1 warmup) for the backward benchmark. Iteration numbering starts from 0.

 ---
 ---
## Other changes included in this Diff:
  - Updates the build dependencies of the tbe_data_config* files
  - Removes the `shutil` and `numpy.random` libraries, as they cause an incompatibility error.
  - Adds a non-OSS test that writes the extracted config data JSON file to Manifold

Differential Revision: D79758603

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 14, 2025
Differential Revision: D79758603
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 18, 2025
Differential Revision: D79758603
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 18, 2025
Differential Revision: D79758603

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 25, 2025
Differential Revision: D79758603
gchalump force-pushed the export-D79758603 branch 2 times, most recently from 9a45a11 to 6bdf1b1 on August 27, 2025 19:36
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 27, 2025
Reviewed By: q10, spcyppt

Differential Revision: D79758603
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 27, 2025
Reviewed By: q10, spcyppt

Differential Revision: D79758603

gchalump added a commit to gchalump/FBGEMM that referenced this pull request Aug 27, 2025
Summary:
X-link: facebookresearch/FBGEMM#1703

Pull Request resolved: pytorch#4672

X-link: facebookresearch/FBGEMM#1516

Pull Request resolved: pytorch#4455

Re-land attempt of D75462895

# Add TBE data configuration reporter to TBE forward call.
The reporter reports the TBE data configuration at the `SplitTableBatchedEmbeddingBagsCodegen` ***forward*** call. The output is a `TBEDataConfig` object, which is written to one or more JSON files. The configuration of its environment variables and an example of its usage are described below.

## Just Knobs for enablement
 - `fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS` is added to enable the reporter (https://www.internalfb.com/intern/justknobs/?name=fbgemm_gpu%2Ffeatures)
    - The default is `False`; turn this flag on to enable the reporter.
    - To enable it locally use:
       ```
       jk canary set fbgemm_gpu/features:TBE_REPORT_INPUT_PARAMS --on --ttl 600
       ```

## Environment Variables

The reporter relies on several environment variables to control its behavior. Each variable is described below:

-   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL**:
    -   **Description**: Sets the interval, in number of iterations, at which reports are generated.
    -   **Example Value**: `1` (report every iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START**:
    -   **Description**: Specifies the start of the iteration range in which reports are captured. Default: `0`.
    -   **Example Value**: `0` (start reporting from the first iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END**:
    -   **Description**: Specifies the end of the iteration range in which reports are captured. Use `-1` to report until the last iteration. Default: `-1`.
    -   **Example Value**: `-1` (report until the last iteration)

-   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET**:
    -   **Description**: Specifies the name of the Manifold bucket where the report data will be saved.
    -   **Example Value**: `tlparse_reports`

-   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX**:
    -   **Description**: Defines the path prefix under which the report files will be stored. The path is created if it does not exist.
    -   **Example Value**: `tree/tests/`
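
As a rough illustration of how these variables could be read, here is a minimal Python sketch. It is not the code added in this diff: `ReporterEnv` and `read_reporter_env` are hypothetical names, and the defaults for the interval, bucket, and path prefix are assumptions (only the `0` and `-1` defaults for the iteration range are stated above).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ReporterEnv:
    """Hypothetical container for the reporter settings (not an FBGEMM class)."""

    interval: int
    iter_start: int
    iter_end: int
    bucket: str
    path_prefix: str


def read_reporter_env(environ=os.environ) -> ReporterEnv:
    """Read the FBGEMM_REPORT_INPUT_PARAMS_* variables from a mapping."""
    prefix = "FBGEMM_REPORT_INPUT_PARAMS_"
    return ReporterEnv(
        # Assumed default: the description above does not state one.
        interval=int(environ.get(prefix + "INTERVAL", "1")),
        # Documented defaults: 0 and -1.
        iter_start=int(environ.get(prefix + "ITER_START", "0")),
        iter_end=int(environ.get(prefix + "ITER_END", "-1")),
        # Assumed defaults: empty strings when unset.
        bucket=environ.get(prefix + "BUCKET", ""),
        path_prefix=environ.get(prefix + "PATH_PREFIX", ""),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in a test without touching the real process environment.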

## Use Cases
- FileStore
  - General
    - Output directories are created automatically if they do not exist.
  - fb-internal:
    - Exports only to Manifold.
    - Raises an assertion error if the flag is set but the Manifold connection fails to initialize (the backend is missing or the Manifold bucket does not exist).
  - OSS
    - Uses the local FileStore to store the output.
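
For the OSS case, the behavior described above (create the output directory if needed, then write the configuration as JSON) might look roughly like the following sketch. It uses only the Python standard library rather than FBGEMM's FileStore; the function name, the `root` argument, and the output file name are hypothetical.

```python
import json
from pathlib import Path


def write_report_locally(root: str, path_prefix: str, iteration: int, config: dict) -> Path:
    """Write one report as JSON under root/path_prefix, creating directories as needed."""
    out_dir = Path(root) / path_prefix
    # Auto-create the output directory tree if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical file naming scheme, chosen only for this illustration.
    out_file = out_dir / f"tbe_config_iter_{iteration}.json"
    out_file.write_text(json.dumps(config, indent=2))
    return out_file
```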

## Example Usage

The following example command runs the benchmark with specific reporter environment variable settings:

```
FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2 FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3 \
FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/ \
buck2 run mode/opt //deeplearning/fbgemm/fbgemm_gpu/bench:split_table_batched_embeddings -- device --iters 2
```

**Explanation**

With the settings above, the reporter reports `iter 3` and `iter 5`.

*   **FBGEMM_REPORT_INPUT_PARAMS_INTERVAL=2**: The reporter generates a report every 2 iterations.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_START=3**: The reporter starts generating reports from iteration 3.
*   **FBGEMM_REPORT_INPUT_PARAMS_ITER_END=-1 (Default)**: The reporter continues to generate reports until the last iteration.
*   **FBGEMM_REPORT_INPUT_PARAMS_BUCKET=tlparse_reports**: The reports are saved in the `tlparse_reports` bucket.
*   **FBGEMM_REPORT_INPUT_PARAMS_PATH_PREFIX=tree/tests/**: The reports are stored with the path prefix `tree/tests/`. For Manifold, make sure all folders within the path exist.

**Note on Benchmark example**

Note that with the `--iters 2` option, the benchmark executes 3 forward calls (2 iterations plus 1 warmup) for the forward benchmark and another 3 forward calls (2 iterations plus 1 warmup) for the backward benchmark, for a total of 6 forward calls. Iteration numbering starts from 0.
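
The selection of reported iterations can be sketched as follows. This is an illustration consistent with the example above, not the reporter's actual code: `should_report` is a hypothetical helper, and it assumes that the interval is counted from the start iteration and that the end of the range is inclusive.

```python
def should_report(iteration: int, interval: int, start: int = 0, end: int = -1) -> bool:
    """Return True if a report would be generated at this iteration (assumed semantics)."""
    if interval <= 0 or iteration < start:
        return False
    # An end of -1 means "report until the last iteration".
    if end != -1 and iteration > end:
        return False
    # Assumption: the interval is counted from the start iteration.
    return (iteration - start) % interval == 0


# Six forward calls, numbered 0 through 5, with INTERVAL=2 and ITER_START=3.
reported = [i for i in range(6) if should_report(i, interval=2, start=3)]
print(reported)  # [3, 5]
```

Under these assumptions the six forward calls yield reports at iterations 3 and 5, which matches the result stated in the explanation.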

---
## Other changes included in this diff:
  - Updates the build dependencies of the `tbe_data_config*` files
  - Removes the `shutil` and `numpy.random` libraries, as they cause an incompatibility error.
  - Adds a non-OSS test that writes the extracted config data JSON file to Manifold

Reviewed By: q10, spcyppt

Differential Revision: D79758603
@gchalump gchalump force-pushed the export-D79758603 branch 2 times, most recently from 92cdc2f to 218bfcd Compare September 1, 2025 22:43
gchalump added a commit to gchalump/FBGEMM that referenced this pull request Sep 1, 2025

@facebook-github-bot

This pull request has been merged in 46d3ae2.
